When authoring text, writers occasionally use one word where another would be correct. For instance, a writer might write the following sentence, using the word "add" where "ad" would be correct:
The add convinced people.
Word pairs like "add" and "ad" that are consistently mistaken for one another are said to be commonly confused. Commonly confused words often have similar pronunciations (e.g., "advise" vs. "advice") or differ by the transposition of a few letters (e.g., "from" vs. "form"). In the above example, the word "ad" is said to be the "intended word," i.e., the word intended by the author, while the word "add" is said to be the "confused word," i.e., the word that the author has mistakenly substituted for the intended word.
When a sentence contains a confused word, natural language parsers have difficulty parsing the sentence. A natural language parser analyzes sentences of a natural language to discern the lexical and syntactic content of the sentences. For example, a chart-based natural language parser retrieves a dictionary entry from a dictionary for each word in the input sentence. The dictionary entry contains a lexical record that holds general information about the word and references part-of-speech records, each of which contains information specific to a particular part of speech that the word may represent. The parser places one or more of the part-of-speech records into a working area called a chart, where they are subjected to parsing rules that combine part-of-speech records into larger syntactic units, and ultimately into a sentence.
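The dictionary lookup and chart seeding described above can be sketched as follows. This is a minimal illustration only: the class names, the toy dictionary contents, and the `seed_chart` helper are assumptions made for the example and are not taken from any particular parser.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class PartOfSpeechRecord:
    """Information specific to one part of speech a word may represent."""
    word: str
    part_of_speech: str


@dataclass
class LexicalRecord:
    """General information about a word, referencing its part-of-speech records."""
    word: str
    pos_records: List[PartOfSpeechRecord] = field(default_factory=list)


def make_entry(word: str, *parts_of_speech: str) -> LexicalRecord:
    """Build a hypothetical dictionary entry for the given word."""
    return LexicalRecord(word, [PartOfSpeechRecord(word, p) for p in parts_of_speech])


# Hypothetical toy dictionary; a real dictionary would be far larger.
DICTIONARY = {
    entry.word: entry
    for entry in [
        make_entry("the", "Det"),
        make_entry("ad", "Noun"),
        make_entry("add", "Verb"),
        make_entry("convinced", "Verb"),
        make_entry("people", "Noun", "Verb"),
    ]
}


def seed_chart(sentence: str) -> List[Tuple[int, int, PartOfSpeechRecord]]:
    """Look up each word and place its part-of-speech records into the chart.

    Each chart item is (start, end, record), where start and end are word positions.
    """
    chart = []
    for position, token in enumerate(sentence.lower().rstrip(".").split()):
        entry = DICTIONARY[token]  # raises KeyError for words missing from the toy dictionary
        for record in entry.pos_records:
            chart.append((position, position + 1, record))
    return chart
```

In this sketch, a word with several parts of speech (here "people") contributes several records to the chart, and the parsing rules would later decide which of them participates in a complete parse.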
When a natural language parser parses a sentence containing a confused word, and the confused word does not have the part of speech that the author intended for the intended word, the parser is unable to produce a complete parse of the sentence. Because the purpose of natural language parsers is to produce complete parses that accurately represent the intended lexical and syntactic content of input sentences, a natural language parser that is able to produce a complete parse of a sentence containing a confused word is desirable.
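The failure mode can be illustrated with a small bottom-up chart recognizer in the CYK style. This is a sketch under stated assumptions, not the parser discussed here: the lexicon, the four binary rules, and the `has_complete_parse` function are all hypothetical and chosen only so that the example sentence can be checked.

```python
# Hypothetical lexicon: each word maps to the parts of speech it may represent.
LEXICON = {
    "the": {"Det"},
    "ad": {"Noun"},
    "add": {"Verb"},
    "convinced": {"Verb"},
    "people": {"Noun"},
}

# Hypothetical binary parsing rules: (left unit, right unit) -> larger unit.
RULES = {
    ("Det", "Noun"): "NP",
    ("Verb", "NP"): "VP",
    ("Verb", "Noun"): "VP",
    ("NP", "VP"): "S",
}


def has_complete_parse(sentence: str) -> bool:
    """Return True if the rules can combine the whole sentence into an S."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    if n == 0:
        return False

    # chart[(start, end)] holds every unit that spans words[start:end].
    chart = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = set(LEXICON.get(word, ()))

    # Combine adjacent spans into progressively wider spans.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            cell = set()
            for mid in range(start + 1, end):
                for left in chart[(start, mid)]:
                    for right in chart[(mid, end)]:
                        parent = RULES.get((left, right))
                        if parent is not None:
                            cell.add(parent)
            chart[(start, end)] = cell

    return "S" in chart[(0, n)]


if __name__ == "__main__":
    print(has_complete_parse("The ad convinced people."))   # intended word
    print(has_complete_parse("The add convinced people."))  # confused word
```

With this toy grammar, the sentence containing the intended noun "ad" yields a complete parse, while the sentence containing the confused word "add" does not, because "add" is available only as a verb and no rule combines a determiner with a verb.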